library(knitr)
opts_chunk$set(message=FALSE, warning=FALSE, comment="")
library(ggplot2)

library(kerasformula)
movies <- read.csv("http://s3.amazonaws.com/dcwoods2717/movies.csv")
dplyr::glimpse(movies)

sort(table(movies$genre))
out <- kms(genre ~ . -director -title, movies, seed = 12345)
plot(out$history) + labs(title = "Classifying Genre", subtitle = "Source data: http://s3.amazonaws.com/dcwoods2717/movies.csv", y="") + theme_minimal()
The classifier does quite well for the top five categories but struggles with rarer ones. Does adding director help?
out <- kms(genre ~ . -title, movies, seed = 12345)
plot(out$history) + labs(title = "Classifying Genre", subtitle = "Source data: http://s3.amazonaws.com/dcwoods2717/movies.csv", y="") + theme_minimal()
Adding director doesn't hurt much but introduces overfitting. Including only the top 50 directors doesn't improve things much either, but it avoids the overfitting issue.
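# Keep the 50 most frequent directors; lump everyone else into "other".
# (rank() on the names would order them alphabetically, not by frequency.)
movies$top50_director <- as.character(movies$director)
top50 <- head(names(sort(table(movies$top50_director), decreasing = TRUE)), 50)
movies$top50_director[!(movies$top50_director %in% top50)] <- "other"
out <- kms(genre ~ . -director -title, movies, seed = 12345)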
plot(out$history) + labs(title = "Classifying Genre", subtitle = "Source data: http://s3.amazonaws.com/dcwoods2717/movies.csv", y="") + theme_minimal()
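
To make the "top directors" idea concrete, here is a minimal base R sketch of lumping rare levels into "other" by frequency. The director names and the cutoff n are hypothetical toy values, not taken from movies.csv, and this is one plausible way to do the lumping rather than a definitive recipe. Ties at the cutoff are broken by whatever order sort() returns.

```r
# Toy data: hypothetical director names, not taken from movies.csv
directors <- c("A", "A", "A", "B", "B", "C", "D", "A", "B", "E")

# Count how often each name appears, most frequent first
n <- 2  # hypothetical cutoff; the vignette uses 50
counts <- sort(table(directors), decreasing = TRUE)

# Names of the n most frequent directors (fewer if there are fewer levels)
top_n <- names(counts)[seq_len(min(n, length(counts)))]

# Replace every name outside the top n with "other"
lumped <- ifelse(directors %in% top_n, directors, "other")
table(lumped)
```

Working from the count table matters here: ranking the names themselves would sort them alphabetically, which says nothing about how many films each director has in the data.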